Abstract. Gravitational waves from the inspiral and coalescence of supermassive
black-hole (SMBH) binaries with masses $m_1 \sim m_2 \sim 10^6\,M_\odot$ are likely to be one
of the strongest sources for the Laser Interferometer Space Antenna (LISA). We 
describe a three-stage data-analysis pipeline designed to search for and measure 
the parameters of SMBH binaries in LISA data. The first stage uses a time- 
frequency track-search method to search for inspiral signals and provide a coarse 
estimate of the black-hole masses $m_1$, $m_2$ and of the coalescence time of the binary
$t_c$. The second stage uses a sequence of matched-filter template banks, seeded
by the first stage, to improve the measurement accuracy of the masses and coa- 
lescence time. Finally, a Markov Chain Monte Carlo search is used to estimate 
all nine physical parameters of the binary (masses, coalescence time, distance, 
initial phase, sky position and orientation). Using results from the second stage 
substantially shortens the Markov Chain burn-in time and allows us to determine 
the number of SMBH-binary signals in the data before starting parameter esti- 
mation. We demonstrate our analysis pipeline using simulated data from the first 
LISA Mock Data Challenge. We discuss our plan for improving this pipeline and 
the challenges that will be faced in real LISA data analysis.

1. Introduction

There is compelling evidence from electromagnetic observations that the cores of 
galaxies contain supermassive black holes (SMBHs). SMBH binaries can form
after galactic mergers as the black holes from the individual galaxies fall to the center 
of the merged system and form a bound pair. Hierarchical-merger models of galaxy 
formation predict that SMBH binaries will be common in galaxies [2, 3] and the
presence of one such binary has been inferred from X-ray measurements of the core of
the galaxy NGC 6240 [3]. The evolution of an SMBH binary will eventually be driven
by radiation reaction from the emission of gravitational waves (GWs) and the binary 
will inspiral and merge to form a single SMBH. The GWs from inspirals of SMBH 
binaries with component masses $m$ in the range $m \sim 10^4$-$10^7\,M_\odot$ will be one of the
strongest sources for LISA, the planned space-based GW detector [5, 6]. The direct
detection of SMBH binaries will be of wide astrophysical relevance, for example by 
probing the merger rates and histories of galaxies [7], or by providing cosmological 
standard candles [5]. 

Searching for SMBH binary inspiral signals is expected to be one of the more
straightforward tasks in LISA data analysis. The velocities of the black holes during
the inspiral are $v/c \ll 1$, and so existing post-Newtonian waveforms [9, 10] will
describe the gravitational waveforms with sufficient accuracy for use as templates
in a matched-filter search [11]. As such, searches for SMBH binaries in LISA data
will be similar in nature to existing searches for binary-neutron-star (BNS) inspirals
in ground-based GW detectors, such as the Laser Interferometer Gravitational-wave
Observatory (LIGO) [12]. However, there are several key differences between LIGO
and LISA binary inspiral searches. First, the LIGO pipelines are designed to search 
for signals with expected signal-to-noise ratios (SNRs) < 10, whereas the SNR of LISA 
SMBH binaries at distances z < 2 is expected to be several hundred or more. Second, 
the BNS signals sweep through the sensitive frequency band of ground-based detectors 
on timescales of order a minute, during which detector velocities and orientations can 
be considered as fixed to high accuracy. By contrast, LISA will be able to observe a 
single SMBH inspiral for weeks to months. During that time, the LISA velocity and 
orientation change appreciably, inducing modulations in the recorded signal. Indeed, 
almost all the information about an SMBH binary's sky location and orientation is 
encoded in these modulations. (In the ground-based case, a network of three or 
more widely separated detectors is required to determine a binary's sky location by 
triangulation between the times of arrival of the GW signals at the different detector 
locations.) Finally, whereas the rate of BNS inspirals in ground-based detectors makes 
it unlikely that multiple signals will be observed concurrently, LISA data may contain 
simultaneous signals from a few different SMBH binaries. 

Existing search pipelines developed for ground-based observations of stellar-mass
binary inspirals can achieve high detection efficiency already at SNRs $\sim 10$
[13, 14, 15, 16], so the task of detecting SMBH inspirals with LISA seems easy
in comparison. Furthermore, since SMBH binaries at z ~ 1 have such high SNR, and 
because of LISA's relatively wider frequency band (roughly three orders of magnitude
for LISA, compared to two for LIGO), it should also be possible to determine the 
masses and spins of the binaries with significantly higher accuracy in the LISA case 
than for ground-based detections. Fisher-matrix calculations suggest that, for SMBHs 
detected at z ~ 1, LISA should be able to determine the chirp mass to relative accuracy 
$\sim 10^{-5}$, both individual masses to $\sim 10^{-3}$ and the SMBH spins to $\sim 10^{-3}$-$10^{-2}$ [17].
Indeed, the goal of our data-analysis pipeline is not only to detect the SMBH signals,
but also to provide accurate measurements of the binary parameters. 

Based partly on the considerations discussed above, our group has adopted the 
following three-stage search method. Low-z SMBH binary inspirals are so bright 
that they are easily visible as tracks in time-frequency (TF) spectrograms. Therefore 
our first stage consists of a search for such TF tracks; the shape and location of 
the track yields a first estimate of the two masses, $m_1$ and $m_2$, and the coalescence
time, $t_c$. The second stage is a set of more refined grid-based matched-filter searches
that start in a neighborhood of the best-fit parameters found in the first stage; these
searches home in on more accurate values for the three parameters $m_1$, $m_2$ and $t_c$. The
final stage is currently a straightforward implementation of a Markov Chain Monte 
Carlo (MCMC) simulated-annealing search for the best-fit parameters in the full nine- 
dimensional parameter space (including also the binary's luminosity distance, initial 
phase, inclination, polarization, ecliptic latitude and longitude). 

There are a few reasons for adopting such a complicated algorithm. First, we 
believe that the capability of looking for TF tracks is a very useful one to develop 
in the LISA context: it is possible that there will be tracks that do not follow the 
expected chirping pattern, and so would not be found by more sophisticated (grid- 
based or MCMC) methods, even though they are visible to the eye. The track-search 
method also allows us to count the number of SMBH binary signals present in the data 
before attempting parameter estimation. Second, the grid search is useful to make sure 
that we do not miss any binary sources, by examining the entire parameter space. In 
the pipeline described here, however, we did not cover the entire parameter space 
in our grid search; rather, we seeded the second-stage search using the parameters 
obtained from the first stage. In future implementations, we intend to compare the 
full grid search to this method. Finally, the MCMC approach is clearly very adept at 
obtaining the final parameter estimates. 

We have tested the performance of our SMBH binary search pipeline using data 
from the Mock LISA Data Challenges (MLDCs) [18, 19]. The MLDCs are a program
sponsored by the LISA International Science Team to foster the development of LISA 
data-analysis methods and tools, and to demonstrate already acquired milestones in 
the extraction of science information from the LISA data output. In the MLDCs, 
GW signals whose parameter values are unknown to the challenge participants are
embedded in synthetic LISA noise; participants are challenged to identify the signals 
and extract their parameters. Challenges of increasing difficulty are being issued 
roughly every six months. The results from the first Challenge are summarized by 
Arnaud and colleagues in this volume [20]. Challenge 1 included two datasets with 
signals from isolated SMBH systems; we analyzed one of them. One of the goals of 
the MLDCs is to demonstrate that data-analysis pipelines can actually achieve the 
fantastic parameter measurement accuracy predicted by the Fisher-matrix analysis. 

Two other differences between the ground-based and space-based cases deserve 
mention. First, SMBH binaries may enter the LISA band with considerable 
eccentricity, whereas the BNSs observed by ground-based detectors will have become 
essentially circular by the time they enter the observation band. Second, in the 
ground-based case the binary-inspiral signals are immersed in noise that originates 
almost entirely from the instrument, while through much of LISA's sensitivity band 
the dominant noise comes from unresolved Galactic white-dwarf binaries. To keep 
Challenge 1 relatively simple, however, these last two complications were omitted in 
creating the synthetic datasets, and hence from our initial pipeline described here.

The rest of this paper is organized as follows: in sections 2-4 we describe the three
stages in our SMBH binary data-analysis pipeline: a track search in the time-frequency
plane, a grid-based matched-filtering search, and Markov Chain Monte Carlo; in section 5 we
present the results of analyzing the MLDC dataset 1.2.1; and in section 6 we discuss our plans
for improving the pipeline to cope with issues such as binary eccentricity and the noise
sources likely to be observed in real LISA data.

2. Stage 1: Search for tracks in the time-frequency plane

The TF spectrogram contains enough information to identify an SMBH binary inspiral 
at a high SNR. The techniques described below make it possible to quickly search for 
the presence of an SMBH binary inspiral in the signal and to get rough estimates of 
its parameters. 

Challenge 1 includes signals from the adiabatic inspiral of a circular binary system 
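of nonspinning SMBHs. The frequency evolution of these inspirals is given by (7.11a)
of [21] in terms of the time of coalescence $t_c$ and the two SMBH masses $m_1$ and $m_2$.
We write it here as a function of the symmetric mass ratio $\eta = m_1 m_2/(m_1 + m_2)^2$ and
the chirp mass $M_c = (m_1 + m_2)\,\eta^{3/5}$, using the second-order post-Newtonian (2PN)
approximation:

\begin{equation}
f_{\rm gw} = \frac{\eta^{3/5}}{8\pi M_c} \left[ (T_c - T)^{-3/8}
  + \left( \frac{743}{2688} + \frac{11}{32}\eta \right) (T_c - T)^{-5/8}
  - \frac{3\pi}{10} (T_c - T)^{-3/4}
  + \left( \frac{1855099}{14450688} + \frac{56975}{258048}\eta + \frac{371}{2048}\eta^2 \right) (T_c - T)^{-7/8} \right] \tag{1}
\end{equation}

Here $f_{\rm gw}$ is the GW frequency in Hz, $M_c$ is expressed in seconds, and $T$ is the
dimensionless time variable related to coordinate time $t$ by $T = t\,\eta^{8/5}/(5 M_c)$.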
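As an illustration of how the 2PN track model can be evaluated numerically, the following is a minimal Python sketch. It assumes the standard 2PN expansion of the GW frequency in powers of $(T_c - T)$; the function name `f_gw_2pn` and the constant `MSUN_S` (a rounded value of the solar mass in seconds) are ours and are not taken from the pipeline code.

```python
import math

# Solar mass expressed in seconds, G*Msun/c^3 (rounded; assumption of this sketch).
MSUN_S = 4.925491e-6


def f_gw_2pn(t, t_c, m_chirp, eta):
    """GW frequency in Hz at time t (s), for coalescence time t_c (s),
    chirp mass m_chirp (s) and symmetric mass ratio eta (2PN, circular, nonspinning)."""
    # tau = T_c - T = eta^(8/5) * (t_c - t) / (5 * M_c)
    tau = eta ** 1.6 * (t_c - t) / (5.0 * m_chirp)
    if tau <= 0.0:
        raise ValueError("t must be earlier than t_c")
    a_1pn = 743.0 / 2688.0 + 11.0 / 32.0 * eta
    a_15pn = -3.0 * math.pi / 10.0
    a_2pn = (1855099.0 / 14450688.0
             + 56975.0 / 258048.0 * eta
             + 371.0 / 2048.0 * eta ** 2)
    series = (tau ** -0.375
              + a_1pn * tau ** -0.625
              + a_15pn * tau ** -0.75
              + a_2pn * tau ** -0.875)
    return eta ** 0.6 / (8.0 * math.pi * m_chirp) * series


if __name__ == "__main__":
    mc = 1.2086e6 * MSUN_S  # chirp mass of the Challenge 1.2.1 source, in seconds
    # Frequency one day before coalescence (a few times 1e-4 Hz for this system).
    print(f_gw_2pn(0.0, 86400.0, mc, 0.160))
```

Far from coalescence the first term dominates and the expression reduces to the Newtonian chirp, which depends on the masses only through $M_c$; this is why the chirp mass is much better constrained by the track than $\eta$.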

We create a TF map of the noisy data stream s(t) = h(t) + n(t) [in fact, one of 
the Time-Delay Interferometry (TDI) channels X(t), Y(t) and Z(t) provided in the 
MLDC datasets], sampled with timestep dt, in two passes. On the first pass, we split
up the data stream into time bins of equal duration $\Delta t$. The TF spectrogram will then
consist of pixels of size $\Delta t \times \Delta f$, where $\Delta f = 1/\Delta t$. We determine the normalized
power contained in each pixel with a Fast Fourier Transform (FFT), normalizing by
the power spectral density of the noise, and then find the peak frequency in each bin
by searching for the loudest pixel (see below for details). The resulting set of {time,
frequency} pairs allows us to search for an inspiral track on the TF map (see figure 1).
Once such a track is identified, we make a second pass through the data, iterating
through the track region with time bins of varying duration to create an improved
TF map. Earlier in the track, a larger $\Delta t$ helps to detect a weak signal and achieve
greater frequency resolution; closer to coalescence, a smaller $\Delta t$ reduces the error in
estimating the rapidly chirping GW frequency.

In fact, we have made several improvements to the general approach outlined in 
the previous paragraph. The first set of improvements concerns the determination 
of the peak frequency in a given time bin. Simply searching for the loudest pixel 
would give frequency-determination errors of order $\Delta f = 1/\Delta t$, even for a noiseless
signal. Instead, we achieve higher accuracy by modeling the bleeding of frequency 
into neighboring pixels: specifically, we determine the peak frequency by fitting the 
logarithm of power in the pixels nearest to the brightest pixel to a parabola, using
zero-padding in the time domain to achieve better frequency resolution when necessary. We
also apply a Hanning window to the signal prior to taking the FFT, and we overlap
time bins to avoid information loss from windowing.

Table 1. Parameters extracted via TF searches from the X, Y and Z channels
of blind Challenge dataset 1.2.1. N is the number of data points obtained during
the second pass through the data and $\Sigma$ is the sum of the squares of the residuals,
as defined in (2).

Channel   N     $\Sigma/10^{-11}$   $M_c/(10^6\,M_\odot)$   $\eta$   $t_c/(10^7\,{\rm s})$
X         156   9.2                 1.2096                  0.182    1.3373
Y         190   9.21                1.2033                  0.139    1.3370
Z         192   11.5                1.2099                  0.183    1.3373
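The sub-pixel peak estimate described above can be sketched as follows. This is a simplified illustration, not the pipeline code: it uses a direct discrete Fourier sum from the Python standard library in place of an FFT, fits the parabola through only the three pixels around the loudest one, and omits zero-padding, noise normalization and bin overlap. The function names are hypothetical.

```python
import cmath
import math


def hann(n):
    """Hanning window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def power_spectrum(x):
    """One-sided |DFT|^2 by direct summation (adequate for short time bins)."""
    n = len(x)
    out = []
    for k in range(n // 2 + 1):
        acc = sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        out.append(abs(acc) ** 2)
    return out


def peak_frequency(x, dt):
    """Dominant frequency (Hz) of samples x with spacing dt, refined by fitting
    a parabola to the log-power of the loudest pixel and its two neighbors."""
    n = len(x)
    windowed = [a * w for a, w in zip(x, hann(n))]
    p = power_spectrum(windowed)
    k = max(range(1, len(p) - 1), key=p.__getitem__)  # loudest interior pixel
    lm, l0, lp = (math.log(p[i]) for i in (k - 1, k, k + 1))
    curvature = lm - 2.0 * l0 + lp
    # Vertex of the parabola through the three points, in units of pixels.
    shift = 0.5 * (lm - lp) / curvature if curvature != 0.0 else 0.0
    return (k + shift) / (n * dt)


if __name__ == "__main__":
    n, dt = 128, 1.0
    f0 = 10.3 / (n * dt)  # deliberately placed between two pixels
    x = [math.sin(2.0 * math.pi * f0 * j * dt) for j in range(n)]
    print(f0, peak_frequency(x, dt))
```

For a noiseless sinusoid the residual bias of this three-point fit is a small fraction of a pixel, compared with an error of up to half a pixel when the loudest pixel is used directly.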

Another improvement concerns the variable timestep and the selection of outliers 
on the second pass through the data. If the peak frequencies of neighboring time bins 
differ by more than $2\Delta f$, we decrease $\Delta t$ by a pre-set factor (say, 1.5) to reduce the
sweep of frequency in each bin. If this operation fails to bring the peak frequencies 
closer together, we declare the data point an outlier, and skip to the next bin. 

The {time, frequency} data points obtained on the second pass serve as inputs 
to a MATLAB least-squares fitting algorithm that extracts the inspiral parameters $t_c$,
$M_c$ and $\eta$ by fitting these points to the model of (1) (see figure 2). Specifically, we
find the values $t_c$, $M_c$ and $\eta$ that minimize the sum

\begin{equation}
\Sigma = \sum_{i=1}^{N} \left[ f(t_i) - f_{\rm gw}(t_i; t_c, M_c, \eta) \right]^2 , \tag{2}
\end{equation}

where the $t_i$ are the centers of the output time bins, $f(t_i)$ are the associated frequencies,
and $f_{\rm gw}(t_i; t_c, M_c, \eta)$ is the model from (1).
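A nonlinear fit of this kind needs a starting point. One possible way to obtain it, which is our illustration and not the MATLAB procedure used in the pipeline, is to note that at leading (Newtonian) order the track satisfies $f_{\rm gw}^{-8/3} = b\,(t_c - t)$ with $b = (8\pi)^{8/3} M_c^{5/3}/5$, so that an ordinary linear regression of $f^{-8/3}$ against $t$ yields rough values of $t_c$ and $M_c$. The sketch below assumes this simplification; it does not minimize the same sum as the full fit and carries no information on $\eta$.

```python
import math


def newtonian_seed(times, freqs):
    """Rough (t_c, chirp mass in seconds) from {time, frequency} points,
    using the Newtonian relation f^(-8/3) = b * (t_c - t)."""
    ys = [f ** (-8.0 / 3.0) for f in freqs]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    slope = sxy / sxx                    # equals -b
    intercept = y_mean - slope * t_mean  # equals b * t_c
    b = -slope
    t_c = intercept / b
    m_chirp = (5.0 * b / (8.0 * math.pi) ** (8.0 / 3.0)) ** 0.6
    return t_c, m_chirp


if __name__ == "__main__":
    # Synthetic, noiseless Newtonian track (values chosen for illustration only).
    mc_true, tc_true = 5.953, 1.3374e7
    ts = [1.0e7 + 1.0e4 * i for i in range(300)]
    fs = [(1.0 / (8.0 * math.pi * mc_true))
          * ((tc_true - t) / (5.0 * mc_true)) ** -0.375 for t in ts]
    print(newtonian_seed(ts, fs))
```

The values returned by such a seed could then be handed to any general-purpose least-squares routine that minimizes the full 2PN residual sum over $t_c$, $M_c$ and $\eta$.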

Although one could weight the data points on the basis of the signal amplitude, 
such a weighting seems to carry little benefit: late in the inspiral, the increased 
amplitude offers greater SNR, which is however substantially offset by poorer 
frequency determination (due either to frequency drift within each time bin if $\Delta t$
is not properly adjusted, or to low frequency resolution if it is). 

Table [1] shows the results of the TF search on the blind Challenge dataset 1.2.1. 
After averaging results from the three TDI streams, we found $M_c = 1.208 \times 10^6\,M_\odot$,
$\eta = 0.17$ and $t_c = 1.3372 \times 10^7$ s. The accuracy of these estimates is discussed in
section 5 below; suffice it to say that these first-stage results were certainly accurate
enough for our purpose. 

3. Stage 2: Grid-based search 

The grid-based part of the search relies on the template placement algorithm of
Babak et al. [13] and the findchirp matched-filtering algorithm of Allen et al. [14],
both of which were developed for the LIGO binary neutron star searches. The basic
algorithm is as follows: a grid of templates is constructed in the $(m_1, m_2)$ plane
using the metric-based square-grid placement algorithm [22, 13] implemented in the
LIGO Algorithm Library (LAL) [23]‡. The fineness of the grid is specified by its
minimum-match parameter MM, which is the minimum overlap between any point in
the parameter space and its nearest grid-point. To implement the algorithms described
in [13], we have written C code which implements the matched filtering algorithms
and template generation. These C functions are then "wrapped" by the Simplified
Wrapper Interface Generator (SWIG), which allows them to be called from the Python
high-level programming language. This approach allows us to rapidly prototype and
develop the procedure described below.

‡ Babak et al. also describe a more efficient hexagonal placement algorithm; however, we were unable
to place templates for LISA SMBH binaries using the LAL implementation of this algorithm. We
intend to work with the authors of the LAL code to resolve this.

Figure 1. Time-frequency plot of the brightest pixel in each time bin, as
computed for the X channel of Challenge 1 training set 1.2.1. The bottom plot is
a blown-up version of the top plot showing the presumed track found on the first
pass through the data.

Figure 2. The stars represent individual points on the TF map obtained during
the second pass through the data in the X(t) channel of Training Set 1.2.1.
The curve is the result of fitting these points to the model (1). (Horizontal axis: time [s].)

For each mass pair in the grid, we compute a (Fourier-transformed) waveform
$\tilde h(f)$ (corresponding to coalescence at $t = 0$), using 2PN waveforms and the stationary
phase approximation (SPA) [24]. We transform from $\tilde h(f)$ to the LISA TDI variable
$\tilde X_h(f)$ using

\begin{equation}
\tilde X_h(f) = \sin^2(2\pi f L)\, \tilde h(f) , \tag{3}
\end{equation}

where $L$ is the LISA arm length. Let the (Fourier-transformed) data be $\tilde X_s(f)$; then
for each template waveform $\tilde X_h(f)$ in our grid we use the FFT to compute the inverse
Fourier transform

\begin{equation}
z(t) = \int \frac{\tilde X_s(f)\, \tilde X_h^*(f)}{S_n(f)}\, e^{2\pi i f t}\, df , \tag{4}
\end{equation}

where $S_n(f)$ is the power spectral density of the noise,
and we maximize $|z(t)|$ over $t$ to estimate the time of coalescence. We identify the
best-fit point in the $(m_1, m_2)$ plane, and then repeat the search in a neighborhood
of that point, with a finer grid. We do this four times, with a final minimum-match
parameter MM = 0.995. For Challenge 1.2.1, based on the results from the TF stage
($m_1 \approx 2.9 \times 10^6\,M_\odot$ and $m_2 \sim 7.3 \times 10^5\,M_\odot$), we chose our initial grid to cover the
portion of the $(m_1, m_2)$ plane satisfying $6 \times 10^5\,M_\odot < m_2 < m_1 < 3.2 \times 10^6\,M_\odot$, with
initial MM = 0.30.
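The maximization over arrival time can be illustrated with a toy discrete version of the matched filter. The sketch below is not the C/SWIG implementation: it uses a direct discrete Fourier sum instead of an FFT, a flat noise spectrum, circular (periodic) data, and a made-up chirp-like burst as template. All names are hypothetical.

```python
import cmath
import math


def dft(x, sign=-1):
    """Direct discrete Fourier sum; sign=-1 is the forward transform,
    sign=+1 the unnormalized inverse."""
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]


def matched_filter_peak(data, template, psd):
    """Return (lag of max |z|, max |z|), where z is the inverse transform of
    data~(f) * conj(template~(f)) / psd(f)."""
    d_f = dft(data)
    h_f = dft(template)
    integrand = [d * h.conjugate() / s for d, h, s in zip(d_f, h_f, psd)]
    z = [v / len(data) for v in dft(integrand, sign=+1)]
    k = max(range(len(z)), key=lambda i: abs(z[i]))
    return k, abs(z[k])


if __name__ == "__main__":
    n = 64
    # Toy chirp-like burst occupying the first half of the segment.
    h = [math.sin(2.0 * math.pi * (0.05 * j + 0.002 * j * j)) if j < 32 else 0.0
         for j in range(n)]
    data = [h[(j - 17) % n] for j in range(n)]  # same burst, delayed by 17 samples
    print(matched_filter_peak(data, h, [1.0] * n))
```

With a flat spectrum the filter output is simply the circular cross-correlation of data and template, so the peak falls at the injected delay. Repeating the calculation for every template in the grid and keeping the largest peak gives the best-fit masses and coalescence time.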

Now, our parameter-estimation errors are dominated not by the coarseness of 
the grid, but by the fact that our 2PN SPA waveforms are not identical to BBH 
waveforms injected into the Mock LISA data, even for the same parameter values. 
Our 2PN SPA waveforms differ from the MLDC versions by higher-order PN terms, 
and do not include the modulations due to the detector motion. They are also simply 
cut off at the frequency of the innermost stable circular orbit (ISCO) of a test mass in 
the Schwarzschild spacetime, while the MLDC waveforms end with a very particular
choice of taper. Therefore we do one final grid search using MLDC waveforms (again
with MM = 0.995), and for some particular choice of the five angles $(\theta, \phi, \iota, \psi, \varphi_0)$.
Although these angles are wrong, in this step the other features of the templates 
(e.g., the Doppler modulation of the frequency due to LISA's orbit and the amplitude 
taper) do match those of the injected MLDC binary waveforms, and so presumably 
yield improved parameter estimates. 

4. Stage 3: Markov Chain Monte Carlo 

So far, the first two stages have given estimates only of the two masses and coalescence 
time; in addition, the stage-2 analysis was based only on the X channel. Thus, we 
rely on the MCMC stage to find the distance, sky location, and the polarization and 
inclination angles of our source. A more efficient way to do this would be to use the 
$\mathcal{F}$-statistic [25, 26] to automatically optimize over four amplitude parameters that are
functions of distance, polarization, inclination and initial phase; however, we did not 
have time to implement this procedure for Challenge 1. Therefore our MCMC code
does a brute-force search over all parameters, but with the advantage that it starts
in the right vicinity for the masses and coalescence time, as estimated in the first two 
stages. 

MCMC approaches have shown promise in the extraction of GW-source 
parameters with LISA [27, 28, 29, 30, 31, 32]. Nevertheless, it has been suggested
that, for SMBH binaries, MCMC searches over a full parameter set need to be started 
in a neighborhood of the correct source parameters to efficiently characterize the 
posterior probability density functions [28]. Since the initial search grid provided a 
good estimate of three parameters (the constituent masses and coalescence time t c ), 
and since it is trivial to extremize analytically over the luminosity distance, we were 
hopeful that we could determine the values of the sky location and binary orientation 
with a straightforward implementation of the Metropolis-Hastings Algorithm (MHA). 
Since time was limited and posterior distributions were not required for Challenge 1, 
we chose not to estimate these, but rather to use the MHA to locate the best-fit 
parameters. 

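In the MHA, a Markov chain is built by accepting a new proposed point with
probability $\alpha = \min(1, H)$; $H$ is the Hastings ratio for a jump from position $x$ to $y$ in
parameter space, given by

\begin{equation}
H = \frac{p(y)\, p(s|y)\, q(x|y)}{p(x)\, p(s|x)\, q(y|x)} , \tag{5}
\end{equation}

where $p(x)$ is the prior distribution, $p(s|y)$ is the likelihood of the parameter set $y$
producing the signal $s$, and $q$ is the proposal distribution used to generate the
moves, with $q(y|x)$ the density for proposing $y$ from $x$. If the noise is a normal process
with zero mean, the likelihood is given by

\begin{equation}
p(s|x) = C \exp\left[ -\frac{1}{2} \left( s - h(x) \,|\, s - h(x) \right) \right] , \tag{6}
\end{equation}

where $h(x)$ is the signal template for parameters $x$ and $C$ is a normalization constant,
with $(\cdot|\cdot)$ the standard inner product computed with respect to the LISA instrument
noise.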
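The accept/reject rule can be sketched in a few lines of Python. The example below is a toy, not the MCMC code used for the Challenge: it assumes a symmetric Gaussian random-walk proposal (so the proposal densities cancel in the Hastings ratio), a flat prior, white noise of known standard deviation, and a trivial "template" equal to the parameters themselves. All names are hypothetical.

```python
import math
import random


def log_likelihood(s, h, sigma):
    """log p(s|x) up to a constant for white Gaussian noise: -(s - h | s - h) / 2,
    with the inner product reduced to a sum of squares divided by sigma^2."""
    return -0.5 * sum((a - b) ** 2 for a, b in zip(s, h)) / sigma ** 2


def metropolis_hastings(log_post, x0, step, n_steps, rng):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    x = list(x0)
    lp_x = log_post(x)
    chain = []
    for _ in range(n_steps):
        y = [xi + rng.gauss(0.0, step) for xi in x]
        lp_y = log_post(y)
        # Symmetric proposal: log H is the difference of the log posteriors.
        log_h = lp_y - lp_x
        if log_h >= 0.0 or rng.random() < math.exp(log_h):
            x, lp_x = y, lp_y  # accept the jump with probability min(1, H)
        chain.append(list(x))
    return chain


if __name__ == "__main__":
    rng = random.Random(1)
    s = [1.0, -2.0]  # toy data; the template is the parameter vector itself
    chain = metropolis_hastings(lambda x: log_likelihood(s, x, 0.5),
                                [0.0, 0.0], 0.5, 20000, rng)
    tail = chain[2000:]  # discard burn-in
    print([sum(p[i] for p in tail) / len(tail) for i in range(2)])
```

Because only the difference of log posteriors enters, the normalization constant of the likelihood never needs to be evaluated. For a best-fit search rather than posterior sampling, one would simply keep track of the highest log-likelihood point visited by the chain.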

The Markov chain process is guaranteed to converge to the posterior probability 
distribution if the proposal distribution is nontrivial; however, the speed of convergence 
does depend on its choice. In this search we adopted two types of proposals: 
the first consisted of a multivariate normal distribution with jumps directed along 
the eigendirections of the Fisher information matrix, computed locally; the second 
amounted to drawing parameters from uniform distributions. For the angular 
parameters, both timid and bold draws (from small or large ranges) were made to 
ensure we were fully exploring parameter space; for the component masses, only timid 
draws (< 1%) were used. 

Multiple concurrent chains were started using the parameter estimates obtained 
in stage 2. These were run on a supercomputing cluster with 3.2 GHz Intel Pentium 
4 processors, using Synthetic LISA [33] to reproduce the LISA response to the SMBH 
binary waveforms. Each run was limited to 12 hours, providing about 3,500 steps in
each of the chains. The most promising candidates at the end of the first run were
used as the starting locations of a second run. At the end of the first run the best
candidates had reached log likelihood values in the neighborhood of 200,000; the
second run saw them increase to about 205,000. The chains converged around two points
in parameter space, differing by their locations on opposite sides of the sky. This was 
not unexpected: dual maxima at antipodal sky positions are a well-known degeneracy 
for LISA sources. Our choice between the two final parameter sets was based on a 
visual comparison of the putative signals with the challenge dataset.

In future implementations of the pipeline, we plan to incorporate the $\mathcal{F}$-statistic
in the MCMC stage to reduce the size of parameter space. This will increase search 
efficiency and relax the need to begin the search in a neighborhood of the best-fit 
parameters (something that will be necessary when searching for the dimmer SMBH 
binaries of Challenge 2). Another time-saving measure will be to start the search on 
a limited portion of the data stream, and then steadily increase its size. This process, 
called frequency annealing [34], allows a quick initial exploration of parameter space,
and a careful later investigation of the exquisitely sharp likelihood peaks close to bright 
SMBH binaries. 

5. Results for MLDC Challenge 

As was the case for many Challenge-1 participants, the Dec. 3, 2006 submission 
deadline arrived before our pipeline was fully ready; nevertheless we decided to submit 
our best estimates for the parameters of the blind dataset 1.2.1. This dataset consisted 
of the three TDI unequal-Michelson channels X(t), Y(t) and Z(t). In stage 1 of our 
search, we analyzed each of these channels separately, and simply averaged the three 
results to arrive at the stage-one parameter estimates shown in the fourth column of
table 2. In stage 2, only the X(t) data was analyzed (partly because of time pressure).
In stage 3, we analyzed two orthogonal TDI channels given by $X$ and $(X + 2Y)/\sqrt{3}$.

The true signal parameters were made publicly available on Dec. 4, and here we 
briefly describe how our search fared in their recovery. The injected signal had a 
combined§ (A + E) SNR of 667.734; its true physical parameters are listed in the third
column of table 2. Our best-fit waveform matched the true waveform rather well: it
had an SNR of 664.47 and its cross-correlation with the true waveform was 0.994 for
the A channel and 0.996 for the E channel [35]. The quality of the fit is illustrated
in figure 3, which compares the true X(t) (produced by us from the key file) with
our best-fit X(t), for short time stretches near the coalescence time t c and near the 
beginning of the dataset. Clearly our fit is excellent near t c , where most of the SNR 
accumulates, but is much poorer at early times, when the contribution to the SNR 
is much lower. The lesson from the other two Michelson variables is qualitatively the 
same. 

Our best-fit parameters are listed in the last column of table 2: our inferred chirp
mass $M_c$ was correct to within $\Delta M_c/M_c < 10^{-3}$, our inferred symmetric mass ratio
$\eta$ to within $\Delta\eta \approx 4 \times 10^{-3}$, and the error in our coalescence time was $\Delta t_c \approx 45$ s,
corresponding to approximately 0.05 GW periods just before the plunge. Nevertheless,
it is clear from our estimates for the other parameters that, instead of converging on a 
neighborhood of the true maximum, our MCMC code locked onto a high but secondary 
maximum of the posterior probability distribution. Our inferred sky position is almost 
at the antipodes of the actual location (i.e., our ecliptic latitude is approximately the 
negative of the true value, and our ecliptic longitude is off by nearly $\pi$). This was
not due to a mismatch of conventions or a bug in our code; rather, it reflects the 
above-mentioned degeneracy between antipodal sky locations (the degeneracy becomes 
perfect in the low-frequency limit). The four parameters $(D, \iota, \psi, \varphi_0)$ that determine
the overall complex amplitudes of the GW polarizations $h_+$ and $h_\times$ were also off by
factors of order one, except for our overall phase $\varphi_0$, which was correct to within 0.004
radians (modulo $\pi$).

§ In this context, A and E are the orthogonal, optimal TDI observables given by $(2X - Y - Z)/3$
and $(Z - Y)/\sqrt{3}$, as used in [20]. The third orthogonal, optimal TDI observable, T, contributes only
a tiny fraction of the total SNR for these sources.

A Three-Stage Search for Supermassive Black Hole Binaries in LISA Data

Table 2. True values and estimates from three steps for the challenge parameters.
In stages 1 and 2 estimates were made only for parameters $M_c$ and $\eta$ (and therefore
$m_1$ and $m_2$) and $t_c$.

Parameter                    Unit               True value   Stage 1   Stage 2     Stage 3
$M_c$                        $10^6\,M_\odot$    1.2086       1.208     1.2108      1.2077
$\eta$                                          0.160        0.17      0.163       0.156
$m_1$                        $10^6\,M_\odot$    2.8972       2.74      2.8536      2.9652
$m_2$                        $10^6\,M_\odot$    0.7270       0.76      0.7381      0.7130
$t_c$                        $10^7$ s           1.3374027    1.3372    1.3374149   1.3374072
Ecl. Lat. $\theta$           rad                -0.492                             0.536
Ecl. Long. $\phi$            rad                0.866                              4.039
Pol. Angle $\psi$            rad                3.234                              5.886
Init. Phase $\varphi_0$      rad                3.527                              0.233
Distance $D$                 $10^9$ pc          8.000                              16.811
Incl. Angle $\iota$          rad                1.944                              0.617
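The statement that the recovered sky position lies almost at the antipodes of the true one can be checked directly from the ecliptic latitudes and longitudes quoted in Table 2. The following short Python sketch (function names are ours) computes the great-circle separation of the stage-3 position from the true position and from its antipode.

```python
import math


def antipode(lat, lon):
    """Antipodal sky position: latitude changes sign, longitude shifts by pi."""
    return -lat, (lon + math.pi) % (2.0 * math.pi)


def angular_separation(lat1, lon1, lat2, lon2):
    """Great-circle angle (rad) between two positions given as (latitude, longitude)."""
    c = (math.sin(lat1) * math.sin(lat2)
         + math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2))
    return math.acos(max(-1.0, min(1.0, c)))


if __name__ == "__main__":
    true_pos = (-0.492, 0.866)   # true ecliptic latitude, longitude (rad), Table 2
    found_pos = (0.536, 4.039)   # stage-3 estimate (rad), Table 2
    print(angular_separation(*true_pos, *found_pos))            # close to pi
    print(angular_separation(*antipode(*true_pos), *found_pos)) # a few hundredths of a rad
```

With the tabulated values the stage-3 position comes out roughly 0.05 rad from the antipode of the true location, consistent with the chain having locked onto the secondary maximum.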




Figure 3. Comparison of our best-fit X(t) to the true X(t) for a) a short stretch
of time near $t_c$ and b) a short stretch near the beginning of the dataset. Clearly,
our fit is excellent near $t_c$, where most of the SNR accumulates, but much poorer
at early times. (Horizontal axis: time in units of $10^7$ s.)



It is also instructive (and reassuring) to contemplate the performance of the first 
two stages of our search. Stage 1 returned $M_c$ with a fractional error $\Delta M_c/M_c < 10^{-3}$,
$\eta$ to within $\sim 6\%$, and $t_c$ to within $\sim 2 \times 10^3$ s. After stage 2, the estimated $M_c$ was in
fact slightly worse, but the errors in $\eta$ and $t_c$ were significantly reduced, to $\Delta\eta \approx 0.003$
and $\Delta t_c \approx 120$ s. This gratifying level of accuracy indicates that the coarser stages 1
and 2 were indeed accomplishing the job required of them. 
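Since the first two stages estimate $M_c$ and $\eta$ while Table 2 also lists $m_1$ and $m_2$, it may help to spell out the conversion between the two parameterizations. The sketch below uses the definitions $\eta = m_1 m_2/(m_1 + m_2)^2$ and $M_c = (m_1 + m_2)\,\eta^{3/5}$; the function names are ours.

```python
import math


def chirp_mass_eta(m1, m2):
    """(M_c, eta) from the component masses (any consistent mass unit)."""
    total = m1 + m2
    eta = m1 * m2 / total ** 2
    return total * eta ** 0.6, eta


def component_masses(m_chirp, eta):
    """(m1, m2) with m1 >= m2 from chirp mass and symmetric mass ratio."""
    total = m_chirp * eta ** -0.6
    root = math.sqrt(max(0.0, 1.0 - 4.0 * eta))  # equals (m1 - m2) / (m1 + m2)
    return 0.5 * total * (1.0 + root), 0.5 * total * (1.0 - root)


if __name__ == "__main__":
    # True component masses from Table 2, in units of 1e6 solar masses.
    mc, eta = chirp_mass_eta(2.8972, 0.7270)
    print(mc, eta)                      # compare with the tabulated M_c and eta
    print(component_masses(mc, eta))    # round trip back to m1, m2
    # Coalescence-time errors after stages 2 and 3, in seconds.
    print(1.3374149e7 - 1.3374027e7, 1.3374072e7 - 1.3374027e7)
```

Because $m_1$ and $m_2$ depend on $\sqrt{1 - 4\eta}$, a modest error in $\eta$ translates into a comparatively large error in the individual masses even when $M_c$ is well determined, which is the pattern seen in the stage-1 column of Table 2.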



6. Future Directions 



As explained above, the most obvious improvement to our pipeline will be to recast 
the MCMC stage so that it maximizes the $\mathcal{F}$-statistic on the 5-dimensional space
$(M_c, \eta, t_c, \theta, \phi)$, reducing the search-space dimensionality by three. In addition, we
will extend our grid search to handle the case where the merger occurs after the end 
of the dataset (we did not compete on dataset 1.2.2 because our current grid search 
could not handle such mergers). This generalization should be fairly straightforward.

In the second round of Challenges (see the proceeding by Arnaud and colleagues
in this volume [35]), dataset 2.2 contains signals from an entire Galaxy's worth of 
white-dwarf binaries, four to six SMBH binary inspirals (the exact number is not 
specified) with SNRs ranging from ~ 10 to ~ 2000, and five EMRIs. Our plan is to 
first run our pipeline as a standalone search for the SMBH binaries, and then to join 
forces with Crowder and Cornish's WD binary search [32] to iteratively improve the
fits provided by the two searches. Beyond that, we plan to extend the SMBH binaries 
search to include: 1) merger and ringdown waveforms; 2) spin-precession effects; and 
3) the effects of nonzero eccentricity. For the first two items, we intend to make 
use of the technology already developed by the ground-based GW community. For 
instance, Buonanno, Chen, and Vallisneri [36] have shown how searches for binaries of 
spinning BHs can be made considerably more efficient by dividing the parameters into 
intrinsic (such as the masses) and extrinsic (such as the orientation of the orbital plane 
at a fiducial time), and optimizing over the extrinsic parameters semi-analytically. 
(This can be viewed as a generalization to spinning binaries of the $\mathcal{F}$-statistic analysis
mentioned above.) We shall endeavour to generalize this strategy to LISA searches 
for SMBH binaries.